feat: ContextGC tool — explicit garbage collection of tool results and scratchpad notes #55

Draft
Simon-Free wants to merge 11 commits into SafeRL-Lab:main from Simon-Free:pr8-context-gc

@Simon-Free Simon-Free commented Apr 17, 2026

Changes (updated)

  • context_gc.py: Complete rewrite with NoteSave/NoteRead tools, methodology note protection, compact_xml support, verbatim audit with tool arg summaries, strip_trashed_stubs
  • agent.py: Added notes_timeline tracking, _state config binding
  • tools/__init__.py: Added NoteSave, NoteRead, enhanced ContextGC schemas (compact_xml, detailed snippet types)
  • tool_registry.py: Added is_concurrent_safe() helper

Tests

  • 28 tests covering all GC operations, methodology protection, NoteSave/NoteRead, stub detection, snippet trimming, audit generation

Port of bouzecode context_gc package (flat-file adaptation).

@Simon-Free Simon-Free marked this pull request as draft April 17, 2026 19:46
Simon FREYBURGER and others added 9 commits April 18, 2026 09:30

Remove compact_assistant_xml, compact_assistant_xml_selective, _xml_replacer,
_build_tc_lookup and _TOOL_USE_RE. These functions compact inline
<tool_use name=... id=...>...</tool_use> XML blocks inside assistant message
content, which only exist on providers that don't natively support
tool_use blocks (e.g. AWS Bedrock socle in bouzecode). Upstream cheetahclaws
uses the native Anthropic content: [{"type":"tool_use", ...}] format, so
these functions early-returned on every call and the compact_tool_history
branch that invoked compact_assistant_xml was a no-op.

Also fix _apply_context_gc which was wrapped in a double try/except where
the outer pass was unreachable, and which imported only apply_gc while
referencing inject_notes and prepend_verbatim_audit (NameError when
gc_state had entries). Replaced with a single try that imports all three
names and cleanly returns on ImportError if PR SafeRL-Lab#55 isn't deployed
alongside.

Test file drops the TestCompactAssistantXml / TestCompactAssistantXmlSelective
classes that exercised the removed functions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add gc_state (trashed_ids, snippets, notes) as a real field on AgentState,
serialize it in _build_session_data, and rehydrate it via a new helper
_restore_state_from_data that is shared by cmd_load / cmd_resume /
cmd_cloudsave load.

Without this, any /save followed by /load silently drops trashed_ids: the
tool_results previously elided by ContextGC re-materialize in the next
turn's context window, leaking tens of thousands of tokens on long
sessions. Tests cover save/load roundtrip and independence of gc_state
instances across AgentState instances.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove the compact_xml field from GCState, the compact_xml parameter from
the ContextGC tool schema, and the XML-compaction branch of apply_gc. The
branch used to dynamically import compact_assistant_xml /
compact_assistant_xml_selective from followup_compaction, but those only
match <tool_use name="X" id="Y">...</tool_use> strings inside assistant
text content -- a shape that only appears on providers without native
tool_use support (e.g. AWS Bedrock socle in bouzecode). Upstream
cheetahclaws emits native Anthropic content blocks, so the XML branch was
an unreachable no-op. The branch also had a latent NameError
(compact_assistant_xml_selective was imported under the wrong name),
which is why 2 existing tests were red against this branch.

apply_gc is now a 3-line list comprehension dispatching to
_apply_gc_to_message, which in turn delegates to _stub_trashed_tool_result
and _apply_snippet_to_message. Each helper fits on one screen, names its
intent, and no longer hides behavior behind a dead early-return chain.
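The dispatch described above might look roughly like this. The helper names are the ones in this commit; everything else is an assumption for illustration: the argument lists, the stub text, the Anthropic-style `tool_result` blocks keyed by `tool_use_id`, the reading of `snippets` as a map from tool-use id to trimmed replacement text, and the `_map_tool_results` helper.

```python
TRASHED_STUB = "[trashed by ContextGC]"  # hypothetical stub text


def _map_tool_results(message, fn):
    """Return a copy of message with fn applied to each tool_result block."""
    content = message.get("content")
    if not isinstance(content, list):
        return message  # plain-text message: nothing to rewrite
    return {**message, "content": [
        fn(b) if isinstance(b, dict) and b.get("type") == "tool_result" else b
        for b in content
    ]}


def _stub_trashed_tool_result(message, trashed_ids):
    def stub(block):
        if block.get("tool_use_id") in trashed_ids:
            return {**block, "content": TRASHED_STUB}
        return block
    return _map_tool_results(message, stub)


def _apply_snippet_to_message(message, snippets):
    def trim(block):
        snippet = snippets.get(block.get("tool_use_id"))
        return block if snippet is None else {**block, "content": snippet}
    return _map_tool_results(message, trim)


def _apply_gc_to_message(message, trashed_ids, snippets):
    # In this sketch trashing wins over snippets for the same id.
    live = {k: v for k, v in snippets.items() if k not in trashed_ids}
    message = _stub_trashed_tool_result(message, trashed_ids)
    return _apply_snippet_to_message(message, live)


def apply_gc(messages, trashed_ids, snippets):
    return [
        _apply_gc_to_message(m, trashed_ids, snippets)
        for m in messages
    ]
```

Every helper returns a new dict rather than mutating its input, so the stored history is left intact and only the outgoing copy is elided.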

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Introduce a generic config['disabled_tools'] list honoured by the tool
registry in two places:

- get_tool_schemas(disabled=...) filters disabled names out of the
  schema list sent to the LLM; the model never learns the tool exists.
- execute_tool(...) defense-in-depth: any tool_call whose name is
  disabled returns an explicit error tool_result instead of running.

agent.py passes config['disabled_tools'] to get_tool_schemas per turn.
Callers that set disabled_tools=['ContextGC'] now get pre-SafeRL-Lab#55
behaviour with the rest of this PR in place -- which is what makes
the ContextGC tool truly opt-out rather than an implicit behaviour
change for every existing integration.
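A rough sketch of the two enforcement points. `get_tool_schemas(disabled=...)`, `execute_tool`, and `config['disabled_tools']` are named in this commit; the registry layout, `register_tool`, the handler signature, and the error strings are hypothetical.

```python
_TOOLS = {}  # name -> (schema, handler); hypothetical registry layout


def register_tool(name, schema, handler):
    _TOOLS[name] = (schema, handler)


def get_tool_schemas(disabled=()):
    """Schemas sent to the LLM; disabled tools are simply not advertised."""
    blocked = set(disabled)
    return [schema for name, (schema, _) in _TOOLS.items() if name not in blocked]


def execute_tool(name, params, config):
    """Run a tool, refusing disabled names even if the model asks for them."""
    if name in set(config.get("disabled_tools", ())):
        return f"Error: tool '{name}' is disabled"
    if name not in _TOOLS:
        return f"Error: unknown tool '{name}'"
    _, handler = _TOOLS[name]
    return handler(params, config)


register_tool("echo", {"name": "echo"}, lambda params, config: params["text"])
register_tool("ContextGC", {"name": "ContextGC"}, lambda params, config: "ok")
config = {"disabled_tools": ["ContextGC"]}
schemas = get_tool_schemas(disabled=config["disabled_tools"])
```

The schema filter is the primary mechanism. The check in `execute_tool` only matters if a tool call for a disabled name arrives anyway, for example from a replayed history.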

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

agent.run builds a fresh per-turn config dict at the top of the function.
It adds _depth and _system_prompt so tools like Agent can read them, but
forgot to add _gc_state. As a result every ContextGC invocation returned
"Error: no GC state available" and trashed_ids was never mutated in
production.

Add "_gc_state": state.gc_state to the merge. Because state.gc_state is
the same object across turns and is persisted in /save, ContextGC can now
read and mutate it, and its effects carry over a /load.
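To illustrate why the missing key mattered, here is a sketch of the per-turn merge and a tool reading from it. The `_depth`, `_system_prompt`, `_gc_state` keys and the error string come from this commit; `build_turn_config`, `context_gc_tool`, and the `trash` parameter handling are hypothetical stand-ins.

```python
from types import SimpleNamespace


def build_turn_config(base_config, state, depth, system_prompt):
    """Fresh dict every turn, but _gc_state always points at the same object."""
    return {
        **base_config,
        "_depth": depth,
        "_system_prompt": system_prompt,
        "_gc_state": state.gc_state,  # the line that was missing
    }


def context_gc_tool(params, config):
    gc_state = config.get("_gc_state")
    if gc_state is None:
        return "Error: no GC state available"
    ids = params.get("trash", [])
    gc_state.trashed_ids.update(ids)
    return f"Trashed {len(ids)} tool result(s)"


state = SimpleNamespace(gc_state=SimpleNamespace(trashed_ids=set()))
turn1 = build_turn_config({"model": "x"}, state, depth=0, system_prompt="sys")
context_gc_tool({"trash": ["toolu_1"]}, turn1)
turn2 = build_turn_config({"model": "x"}, state, depth=0, system_prompt="sys")
```

Although `turn1` and `turn2` are distinct dicts, both reference `state.gc_state`, so a mutation made during one turn is visible in the next.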

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three scenarios, each driving agent.run through a multi-turn conversation
where only providers.stream is replaced by a scripted generator. All
tools (echo + ContextGC) execute for real through the registry.

- test_llm_trashes_tool_result_via_contextgc_end_to_end: LLM issues echo,
  then ContextGC(trash=[echo_id]); asserts state.gc_state.trashed_ids.
- test_gc_state_survives_save_and_reload_via_session_helpers: same setup
  + _build_session_data → JSON → _restore_state_from_data roundtrip,
  asserts trashed_ids still present after restore.
- test_disabled_tools_hides_contextgc_schema_from_llm: confirms that
  config['disabled_tools']=['ContextGC'] removes the schema from the list
  sent to the stream, proving backwards-compatibility without touching
  the registry.

These cover the integration points unit tests can't see: tool
registration, config injection of _gc_state in agent.run, and the
schema-filter path.
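The scripted-stream technique can be sketched as below. Only the idea of replacing `providers.stream` with a scripted generator comes from this commit. The event dicts, the toy `run` loop, and the `providers` stand-in are assumptions and do not mirror the real agent.run.

```python
from types import SimpleNamespace
from unittest import mock

providers = SimpleNamespace(stream=None)  # stand-in for the real providers module


def make_scripted_stream(turns):
    """Return a replacement for providers.stream that replays canned turns."""
    remaining = iter(turns)

    def stream(*args, **kwargs):
        turn = next(remaining, None)
        if turn is None:
            raise AssertionError("script exhausted: agent asked for an extra turn")
        yield from turn  # one call to stream == one scripted LLM turn

    return stream


def run(gc_state, max_turns=5):
    """Toy agent loop: the LLM is scripted, the tools execute for real."""
    def context_gc(args):
        gc_state["trashed_ids"].update(args["trash"])
        return "ok"

    tools = {"echo": lambda args: args["text"], "ContextGC": context_gc}
    results = {}
    for _ in range(max_turns):
        calls = [e for e in providers.stream() if e["type"] == "tool_call"]
        if not calls:
            break  # a turn with no tool calls ends the conversation
        for call in calls:
            results[call["id"]] = tools[call["name"]](call["input"])
    return results


script = [
    [{"type": "tool_call", "id": "echo_1", "name": "echo", "input": {"text": "hi"}}],
    [{"type": "tool_call", "id": "gc_1", "name": "ContextGC", "input": {"trash": ["echo_1"]}}],
    [{"type": "text", "text": "done"}],
]
gc_state = {"trashed_ids": set()}
with mock.patch.object(providers, "stream", make_scripted_stream(script)):
    results = run(gc_state)
```

Patching only the stream keeps registration, dispatch, and state mutation on the real code path, which is what lets a test of this kind catch wiring bugs.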

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… methodology protection, stub detection, audit improvements